Author Profiling with Doc2vec Neural Network-Based Document Embeddings
نویسندگان
چکیده
To determine author demographics of texts in social media such as Twitter, blogs, and reviews, we use doc2vec document embeddings to train a logistic regression classifier. We experimented with age and gender identification on the PAN author profiling 2014–2016 corpora under both singleand cross-genre conditions. We show that under certain settings the neural network-based features outperform the traditional features when using the same classifier. Our method outperforms existing state of the art under some settings, though the current stateof-the-art results on those tasks have been quite weak.
منابع مشابه
An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This paper presents a rigorous empirical evaluation of doc2vec over two tasks. We compare doc2vec to two baselines and two state-of-the-art document embedding me...
متن کاملNeural Document Embeddings for Intensive Care Patient Mortality Prediction
We present an automatic mortality prediction scheme based on the unstructured textual content of clinical notes. Proposing a convolutional document embedding approach, our empirical investigation using the MIMIC-III intensive care database shows significant performance gains compared to previously employed methods such as latent topic distributions or generic doc2vec embeddings. These improveme...
متن کاملA Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...
متن کاملContext Aware Document Embedding
Recently, doc2vec has achieved excellent results in different tasks (Lau and Baldwin, 2016). In this paper, we present a context aware variant of doc2vec. We introduce a novel weight estimating mechanism that generates weights for each word occurrence according to its contribution in the context, using deep neural networks. Our context aware model can achieve similar results compared to doc2vec...
متن کاملNeural Endorsement Based Contextual Suggestion
This paper presents the University of Amsterdam’s participation in the TREC 2016 Contextual Suggestion Track. In this research, we have studied a personallized neural document language modeling and a neural category preference modeling for contextual suggestion using available endorsements in TREC 2016 contextual suggestion track phase 2 requests. Specifically, our main aim is to answer the que...
متن کامل